22 research outputs found

    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy. Comment: 8 pages, ICRA 201

    Domain Randomization and Generative Models for Robotic Grasping

    Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis. We generate millions of unique, unrealistic procedurally generated objects, and train a deep neural network to perform grasp planning on these objects. Since the distribution of successful grasps for a given object can be highly multimodal, we propose an autoregressive grasp planning model that maps sensor inputs of a scene to a probability distribution over possible grasps. This model allows us to sample grasps efficiently at test time (or avoid sampling entirely). We evaluate our model architecture and data generation pipeline in simulation and the real world. We find we can achieve a >90% success rate on previously unseen realistic objects at test time in simulation despite having only been trained on random objects. We also demonstrate an 80% success rate on real-world grasp attempts despite having only been trained on random simulated objects. Comment: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

    The Identification of Individuals with Disabilities in National Databases: Creating a Failure to Communicate

    The purpose of this study was to analyze similarities and differences in how students with disabilities are identified in national databases. National data collection programs in the U.S. Departments of Education, Commerce, Labor, Justice, and Health and Human Services, as well as databases from the National Science Foundation, the American Council of Education, and the College Board, were examined. Nineteen national data collection programs were selected as being potentially useful in the extraction of policy-relevant information on the educational status and performance of students with disabilities. Among these 19 programs there was significant variability in the disability categories used. These programs were targeted for two reasons: (a) their potential usefulness in providing indicators of domains in key models of educational outcomes for children and youth with disabilities, and (b) their prominence in current efforts to monitor progress toward the attainment of national education goals. Discussed are issues related to improving disability identification in large-scale data collection programs and the effects of these issues on reporting policy-relevant information.

    Conference report

    The Identification of People With Disabilities in National Databases: A Failure to Communicate (NCEO Synthesis Report)

    A report summarizing findings for policymakers, researchers, and educators that focuses on assessment, accommodations, and accountability in relation to K-12 students with disabilities. The Center is supported through a cooperative agreement with the U.S. Department of Education, Office of Special Education Programs (1990-1995: H159C00004; 1995-2000: H159C50004). Opinions or points of view expressed within this document do not necessarily represent those of the U.S. Department of Education or Offices within it.